AITopics | vision and language task

vision and language task

c74d97b01eae257e44aa9d5bade97baf-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-14-2026, 02:06:16 GMT

ablation, pretrain, vision and language task, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (0.32)
Information Technology > Artificial Intelligence > Natural Language (0.31)

clear / well-organized [R1]; our approach "very interesting" [R3] and novel [R2, R3]; our results significant and

Neural Information Processing Systems, Aug-20-2025, 01:49:52 GMT

We thank the reviewers for their thoughtful feedback! We respond to select comments below but will address all feedback. We investigate the RefCOCO+ task. We will perform more task-specific task in the supplementary. VCR extends to answer justifications like "[Person3] is delivering ..." These ablations are valuable and will be added to the paper.

ablation, pretrain, vision and language task, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.41)

Technology:

Information Technology > Artificial Intelligence > Vision (0.32)
Information Technology > Artificial Intelligence > Natural Language (0.31)

Multimodal Deep Learning

Akkus, Cem, Chu, Luyang, Djakovic, Vladana, Jauch-Walser, Steffen, Koch, Philipp, Loss, Giacomo, Marquardt, Christopher, Moldovan, Marco, Sauter, Nadja, Schneider, Maximilian, Schulte, Rickmer, Urbanczyk, Karol, Goschenhofer, Jann, Heumann, Christian, Hvingelby, Rasmus, Schalk, Daniel, Aßenmacher, Matthias

arXiv.org Artificial Intelligence, Jan-12-2023

FIGURE 1: LMU seal (left) style-transferred to Van Gogh's Sunflower painting (center) and blended with the prompt "Van Gogh, sunflowers" via CLIP+VGAN (right).

In the last few years, there have been several breakthroughs in the methodologies used in Natural Language Processing (NLP) as well as Computer Vision (CV). Beyond these improvements on single-modality models, large-scale multimodal approaches have become a very active area of research. In this seminar, we reviewed these approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art approaches in the two subfields of Deep Learning individually. Further, modeling frameworks are discussed where one modality is transformed into the other (Chapter 3.1 and Chapter 3.2), as well as models in which one modality is utilized to enhance representation learning for the other (Chapter 3.3 and Chapter 3.4). To conclude the second part, architectures with a focus on handling both modalities simultaneously are introduced (Chapter 3.5). Finally, we also cover other modalities (Chapter 4.1 and Chapter 4.2) as well as general-purpose multimodal models (Chapter 4.3), which are able to handle different tasks on different modalities within one unified architecture.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2301.04856

Country:

North America > Canada > Ontario > Toronto (0.13)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
Asia > Middle East > Jordan (0.04)
(7 more...)

Genre:

Summary/Review (1.00)
Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(4 more...)
